Exploiting A Large Data Base By Longman
Abstract
We wish to explore some of the aspects of the exploitation of two dictionary files by LONGMAN Ltd, one for 'core' English and one for English idioms. We'll try to show the feasibility of an approach to language processing based on a lexicon, conceived of as the repository of grammatical, semantic and knowledge-of-the-world information. After giving a brief description of the computer files (Section I) we'll focus on the following points: a) a lexical approach to grammar allows a considerable simplification of the PSG component of a parsing system (Section II, Part One); b) the syntactic potential of many lexemes (at surface structure level) can serve as a guide to their deep structure configurations (Section II, Part Two); c) provided that a dictionary makes use of a limited defining vocabulary, the texts of the dictionary definitions can be processed on the basis of correlations between syntactic structures (filled with individual lexemes or lexemes belonging to specifiable classes) and semantic relationships such as that between a process verb and an instrument (Section III).
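Point c) can be illustrated with a minimal sketch. It assumes that, in a dictionary with a limited defining vocabulary, a definition of the shape "a tool/machine ... for VERB-ing" signals an instrument relation between the headword and a process verb. The pattern, the relation label INSTRUMENT-FOR and the toy definitions are all hypothetical; none of them is taken from the Longman files or from the paper's own rules.

```python
import re

# Hypothetical syntactic pattern: a genus term from a small class of
# "instrument" nouns, followed by "for" / "used for" / "used in" and a gerund.
INSTRUMENT_PATTERN = re.compile(
    r"\b(?:tool|instrument|device|machine)\b.*?"
    r"\b(?:for|used for|used in)\s+(\w+ing)\b"
)


def instrument_relation(headword, definition):
    """Return (headword, 'INSTRUMENT-FOR', gerund) if the pattern matches, else None."""
    match = INSTRUMENT_PATTERN.search(definition.lower())
    if match is None:
        return None
    return (headword, "INSTRUMENT-FOR", match.group(1))


# Toy definitions invented for illustration only.
TOY_DEFINITIONS = {
    "saw": "a tool for cutting wood",
    "drill": "a machine used for making holes",
    "apple": "a round fruit with firm white flesh",
}

# Keep only the headwords whose definition matches the instrument pattern.
relations = [
    rel
    for word, text in TOY_DEFINITIONS.items()
    if (rel := instrument_relation(word, text)) is not None
]

for rel in relations:
    print(rel)
```

A surface pattern like this over-generates (any word ending in "-ing" after "for" would match), which is one reason a restricted defining vocabulary and specifiable lexeme classes matter for this kind of processing.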
Similar resources
Spatio-Temporal Analysis of Anthropogenic Hazard Management of Mines in Iran
The appearance of hazards in human life is affected by natural and human forces. So far, human beings have been the most powerful stimulant in creating and intensifying these hazards. The negative role of human beings in the environment is caused by factors like lack of knowledge, weak reaction, lack of technology, aggressive ideologies and competition; in the social system, however, human behavioral engine...
Towards A Dictionary Support Environment For Realtime Parsing
Hiyan Alshawi, Bran Boguraev, Ted Briscoe. Computer Laboratory, Cambridge University, Corn Exchange Street, Cambridge CB2 3QG, U.K. In this article we describe research on the development of large dictionaries for natural language processing. We detail the development of a dictionary support environment linking a restructured version of the Longman Dictionary of Contemporary English to natural la...
Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems
Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
A Class-based Approach to Word Alignment
This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations generally do not have statistical...
The Economies of Scale in Iran Manufacturing Establishments
One of the topics after two decades of applying import substitution policy in Iran manufacturing sector is the importance of industrial export expansion and foreign relations. The main impetus to this policy transfer is the market expansion and potential gains of exploiting the economies of scale and technical upgrades. Based on this argument this research estimates the efficient scale and gain...
Large Lexicons for Natural Language Processing: Utilising the Grammar Coding System of LDOCE
This article focusses on the derivation of large lexicons for natural language processing. We describe the development of a dictionary support environment linking a restructured version of the Longman Dictionary of Contemporary English to natural language processing systems. The process of restructuring the information in the machine readable version of the dictionary is discussed. The Longman ...